
    Profiling users' behavior, and identifying important features of review 'helpfulness'

    The increasing volume of online reviews and the use of review platforms leave traces that can be used to explore interesting patterns. It is in the primary interest of businesses to retain and improve their reputation. Reviewers, on the other hand, tend to write reviews that can influence and attract people's attention, which often leads to deliberate deviations from past rating behavior. To date, few studies have attempted to explore the impact of user rating behavior on review helpfulness, and other aspects of user behavior in selecting and rating businesses still need to be investigated. Moreover, previous studies focused mainly on review features and reported inconsistent findings on their importance. To fill this gap, we introduce new business and reviewer features, modify existing ones, and propose a user-focused mechanism for review selection. This study investigates and reports changes in business reputation, user choice, and rating behavior through descriptive and comparative analysis. Furthermore, the relevance of various features for review helpfulness is identified by correlation, linear regression, and negative binomial regression. The analysis performed on the Yelp dataset shows that the reputation of businesses has changed only slightly over time. Moreover, 46% of users chose a business with a minimum of 4 stars, the majority of users give 4-star ratings, and 60% of reviewers adopt irregular rating behavior. Our results show a slight improvement from using user rating behavior and choice features, whereas the significant increase in R² indicates the importance of reviewer popularity and experience features. Overall, the most significant features of review helpfulness are average user helpfulness, number of user reviews, average business helpfulness, and review length. The outcomes of this study provide important theoretical and practical implications for researchers, businesses, and reviewers.
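
    As an illustration of the final regression step above, here is a minimal Python sketch that fits a negative binomial model of helpfulness votes on the four features the study reports as most significant. The data and column names are synthetic placeholders, not the paper's actual Yelp schema.

        # A minimal sketch of the negative binomial regression step, fitted on
        # synthetic data; the column names mirror the four features the study
        # reports as most significant but are placeholders, not the real schema.
        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n = 200
        reviews = pd.DataFrame({
            "avg_user_helpfulness":     rng.gamma(2.0, 0.5, n),
            "num_user_reviews":         rng.poisson(20, n),
            "avg_business_helpfulness": rng.gamma(2.0, 0.5, n),
            "review_length":            rng.poisson(150, n),
        })
        # Synthetic over-dispersed vote counts driven by the features above.
        mu = np.exp(0.4 * reviews["avg_user_helpfulness"]
                    + 0.01 * reviews["num_user_reviews"])
        reviews["helpful_votes"] = rng.poisson(mu)

        X = sm.add_constant(reviews.drop(columns="helpful_votes"))
        y = reviews["helpful_votes"]

        # A negative binomial GLM suits over-dispersed count outcomes such as
        # helpfulness votes better than ordinary linear regression.
        model = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
        print(model.params)   # coefficient estimates per feature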

    Hybrid feature selection based on principal component analysis and grey wolf optimizer algorithm for Arabic news article classification

    The rapid growth of electronic documents has resulted from the expansion and development of internet technologies. Text-document classification is a key task in natural language processing that converts unstructured data into structured form and then extracts knowledge from it. This conversion generates high-dimensional data that need further analysis using data mining techniques such as feature extraction, feature selection, and classification to derive meaningful insights. Feature selection is a technique for reducing dimensionality in order to prune the feature space and, as a result, lower the computational cost and enhance classification accuracy. This work presents a hybrid filter-wrapper method (PCA-GWO) that uses Principal Component Analysis (PCA) as a filter to select an appropriate and informative subset of features and the Grey Wolf Optimizer (GWO) as a wrapper to select further informative features. Logistic Regression (LR) is used as an evaluator to test the classification accuracy of candidate feature subsets produced by GWO. Three Arabic datasets, namely Alkhaleej, Akhbarona, and Arabiya, are used to assess the efficiency of the proposed method. The experimental results confirm that the proposed PCA-GWO method outperforms the baseline classifiers with and without feature selection, as well as other feature selection approaches, in terms of classification accuracy.
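
    To make the filter-wrapper idea concrete, the sketch below runs PCA as the filter stage and scores candidate subsets with logistic regression. For brevity, greedy forward selection stands in for the Grey Wolf Optimizer, and synthetic data replaces the Arabic document-term matrices, so this illustrates only the shape of the pipeline, not the paper's GWO search.

        # A simplified sketch of the PCA filter stage plus a wrapper scored by
        # logistic regression. Greedy forward selection stands in for the Grey
        # Wolf Optimizer; the synthetic data replaces the Arabic datasets.
        from sklearn.datasets import make_classification
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=300, n_features=100,
                                   n_informative=10, random_state=0)

        # Filter stage: project the high-dimensional features onto principal
        # components to prune the search space for the wrapper.
        Z = PCA(n_components=20).fit_transform(X)

        # Wrapper stage: greedily add the component that most improves
        # cross-validated accuracy (a stand-in for the GWO search).
        selected, remaining, best = [], list(range(Z.shape[1])), 0.0
        while remaining:
            score, j = max(
                (cross_val_score(LogisticRegression(max_iter=1000),
                                 Z[:, selected + [j]], y, cv=3).mean(), j)
                for j in remaining)
            if score <= best:
                break                    # stop when accuracy stops improving
            best, selected = score, selected + [j]
            remaining.remove(j)
        print(f"selected components: {selected}, CV accuracy: {best:.3f}")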

    The role of big data in smart city

    The expansion of big data and the evolution of Internet of Things (IoT) technologies have played an important role in the feasibility of smart city initiatives. Big data offer the potential for cities to obtain valuable insights from a large amount of data collected through various sources, and the IoT allows the integration of sensors, radio-frequency identification, and Bluetooth in the real-world environment using highly networked services. The combination of the IoT and big data is a largely unexplored research area that has brought new and interesting challenges for achieving the goal of future smart cities. These new challenges focus primarily on problems related to business and technology that enable cities to actualize the vision, principles, and requirements of smart city applications by realizing the main smart environment characteristics. In this paper, we describe the existing communication technologies and smart applications used within the context of smart cities. The vision of big data analytics to support smart cities is discussed by focusing on how big data can fundamentally change urban populations at different levels. Moreover, a future business model for managing big data in smart cities is proposed, and the associated business and technological research challenges are identified. This study can serve as a benchmark for researchers and industries for the future progress and development of smart cities in the context of big data.

    A Survey on Underwater Wireless Sensor Networks: Requirements, Taxonomy, Recent Advances, and Open Research Challenges

    The domain of underwater wireless sensor networks (UWSNs) has received considerable attention recently due to its advanced capabilities in ocean surveillance, marine monitoring, and the deployment of applications for detecting underwater targets. However, the literature has not compiled the state of the art in this direction to capture the recent advancements fuelled by underwater sensor technologies. Hence, this paper offers an up-to-date analysis of the available evidence by reviewing studies from the past five years on the various aspects that support network activities and applications in UWSN environments. This work was motivated by the need for robust and flexible solutions that can satisfy the requirements for the rapid development of underwater wireless sensor networks. The paper identifies the key requirements for achieving essential services and common platforms for UWSNs. It also contributes a taxonomy of the critical elements in UWSNs by classifying their architectural elements, communications, routing protocols and standards, security, and applications. Finally, the major challenges that remain open are presented as a guide for future research directions.

    JQPro: Join query processing in a distributed system for big RDF data using the hash-merge join technique

    In the last decade, the volume of semantic data has increased exponentially, with Resource Description Framework (RDF) repositories now holding trillions of triples, and RDF datasets continue to grow. With this growth, complex multi-join RDF queries are becoming a significant demand. Such complex queries often produce many common sub-expressions within a single query or across multiple queries running as a batch. In addition, it is difficult to minimize the number of RDF queries and their processing time over large amounts of related data in a typical distributed environment. To address this, we introduce a join query processing model for big RDF data, called JQPro. By adopting the MapReduce framework in JQPro, we developed three new algorithms for join query processing of RDF data: hash-join, sort-merge, and an enhanced MapReduce-join. In our experiments, the JQPro model outperformed two popular systems, gStore and RDF-3X, with respect to average execution time. Furthermore, JQPro was tested against RDF-3X, RDFox, and PARJs using the LUBM benchmark and again performed better than the other models, achieving an 87.77% improvement in execution time.
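
    As a concrete illustration of the hash-join building block named above, the following sketch joins two sets of RDF triple-pattern bindings in memory; JQPro runs this kind of join inside MapReduce, and the tiny dataset and variable names here are illustrative only.

        # A minimal in-memory hash join over RDF triple-pattern bindings; the
        # actual JQPro algorithms run inside MapReduce, and the toy data and
        # variable names below are illustrative only.
        from collections import defaultdict

        def hash_join(left, right, key):
            """Join two lists of variable-binding dicts on a shared variable."""
            table = defaultdict(list)      # build phase: hash the left side
            for row in left:
                table[row[key]].append(row)
            for row in right:              # probe phase: stream the right side
                for match in table.get(row[key], []):
                    yield {**match, **row}

        # Bindings for the patterns ?x :advisor ?y  and  ?y :worksFor ?z.
        advisors = [{"x": "alice", "y": "prof1"}, {"x": "bob", "y": "prof2"}]
        employs  = [{"y": "prof1", "z": "uni1"}]
        print(list(hash_join(advisors, employs, "y")))
        # -> [{'x': 'alice', 'y': 'prof1', 'z': 'uni1'}]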

    Credit card default prediction using machine learning techniques

    Credit risk plays a major role in the banking industry. Banks' main activities involve granting loans, credit cards, investments, mortgages, and other products. Credit cards have been one of the fastest-growing financial services offered by banks over the past years. However, with the growing number of credit card users, banks have been facing an escalating credit card default rate. Data analytics can therefore provide solutions to tackle this phenomenon and manage credit risk. This paper provides a performance evaluation of credit card default prediction: logistic regression, an rpart decision tree, and a random forest are used to predict credit default, and the random forest proved to have the highest accuracy and area under the curve. This result shows that the random forest best describes which factors should be considered when assessing the credit risk of credit card customers, with an accuracy of 82% and an area under the curve of 77%.
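
    Below is a minimal sketch of the kind of evaluation the abstract reports, using scikit-learn's random forest on synthetic data; the study's actual dataset, preprocessing, and hyperparameters are not reproduced here.

        # A minimal sketch of evaluating a random forest for default prediction
        # on synthetic data; the study's dataset and tuning are not reproduced.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score, roc_auc_score
        from sklearn.model_selection import train_test_split

        # Synthetic stand-in for customer features and binary default labels.
        X, y = make_classification(n_samples=2000, n_features=20,
                                   weights=[0.8, 0.2], random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=0)

        clf = RandomForestClassifier(n_estimators=500, random_state=0)
        clf.fit(X_train, y_train)

        # Accuracy and area under the ROC curve, the two metrics in the abstract.
        acc = accuracy_score(y_test, clf.predict(X_test))
        auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
        print(f"accuracy={acc:.2f}  AUC={auc:.2f}")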

    Distributed Join Query Processing for Big RDF Data

    The expansion of Semantic Web services and the evolution of cloud computing technologies have significantly enhanced the capability of preserving and publishing information in standard open web formats, such that data can be both human-readable and machine-processable. This meets the challenge of the current big data era to effectively store, retrieve, and analyze Resource Description Framework (RDF) data at scale. Efficient data storage and retrieval that can scale to large amounts of possibly schema-less data have proven quite difficult to achieve, particularly for RDF storage with large and complex graph patterns representing semantic data, and for the SPARQL query language. In this paper, we provide a comprehensive discussion of the proposed algorithms for join query processing of RDF data using the MapReduce framework in a distributed environment. Moreover, we introduce a framework for RDF query processing and the benchmark used for performance evaluation. Finally, we discuss the evaluation of distributed join query processing for big RDF data.
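
    To show the MapReduce pattern these join algorithms build on, here is a toy single-process simulation of a reduce-side join on triple patterns; a real deployment would run the map and reduce functions on a cluster framework such as Hadoop, and the data here is illustrative only.

        # A toy single-process simulation of a MapReduce reduce-side join on
        # RDF triples; a real system distributes the map, shuffle, and reduce
        # phases across a cluster. Data and patterns here are illustrative.
        from collections import defaultdict
        from itertools import product

        def map_phase(triples, join_pos, tag):
            # Emit (join-key, tagged-triple) pairs keyed on the join variable.
            for t in triples:
                yield t[join_pos], (tag, t)

        def reduce_phase(grouped):
            # Per key, pair every left triple with every right triple.
            for key, values in grouped.items():
                left  = [t for tag, t in values if tag == "L"]
                right = [t for tag, t in values if tag == "R"]
                yield from ((key, l, r) for l, r in product(left, right))

        # Patterns ?x :advisor ?y  and  ?y :worksFor ?z, joined on ?y.
        p1 = [("alice", "advisor", "prof1")]
        p2 = [("prof1", "worksFor", "uni1")]

        grouped = defaultdict(list)    # the framework's shuffle/group step
        for k, v in map_phase(p1, 2, "L"):
            grouped[k].append(v)
        for k, v in map_phase(p2, 0, "R"):
            grouped[k].append(v)
        print(list(reduce_phase(grouped)))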

    A Community-Based Fault Isolation Approach for Effective Simultaneous Localization of Faults

    During program testing, software programs may be discovered to contain multiple faults. Multiple faults in a program can reduce the effectiveness of existing fault localization techniques because of the complex relationship between faults and failures. In the ideal case, failed tests are isolated into fault-focused clusters, each targeting a single fault, so that developers can localize the faults simultaneously in parallel. However, the relationship between faults and failures is not easily identified and depends heavily on the accuracy of clustering; existing clustering algorithms are unable to isolate failed tests to their causative faults effectively, which hinders localization effectiveness. This paper proposes a new approach that uses a divisive network community clustering algorithm to isolate faults into separate fault-focused communities, each targeting a single fault. A community weighting and selection mechanism is also proposed that helps prioritize highly important fault-focused communities for the available developers so they can debug the faults simultaneously in parallel. The approach is evaluated on eight medium-sized to large-sized subject programs (tcas, replace, gzip, sed, flex, grep, make, and ant). Overall, 540 multiple-fault versions of these programs were generated, each containing 2-5 faults. The experimental results demonstrate that the proposed approach performs significantly better in terms of localization effectiveness than two other parallel debugging approaches for locating multiple faults in parallel.
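
    As a sketch of the divisive community clustering idea, the snippet below isolates failed tests into communities with networkx's Girvan-Newman algorithm; the toy graph, its similarity edges, and the substitution of Girvan-Newman for the paper's own algorithm and weighting mechanism are all illustrative assumptions.

        # A minimal sketch of isolating failed tests into fault-focused
        # communities with a divisive algorithm (networkx's Girvan-Newman);
        # the toy graph below stands in for real coverage-similarity links.
        import networkx as nx
        from networkx.algorithms.community import girvan_newman

        # Nodes are failed tests; edges link tests with similar coverage.
        G = nx.Graph()
        G.add_edges_from([
            ("t1", "t2"), ("t2", "t3"), ("t1", "t3"),   # likely share fault A
            ("t4", "t5"),                               # likely share fault B
            ("t3", "t4"),                               # weak cross-fault link
        ])

        # Girvan-Newman divisively removes the highest-betweenness edge until
        # the graph splits; the first split gives fault-focused communities.
        communities = next(girvan_newman(G))
        print([sorted(c) for c in communities])
        # -> [['t1', 't2', 't3'], ['t4', 't5']]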

    Artificial Intelligence applications in healthcare: A bibliometric and topic model-based analysis

    Artificial Intelligence (AI) has emerged as a leading technology that can significantly enhance healthcare systems, including diagnosis and treatment recommendations, patient engagement and adherence, and health predictions, because of recent developments in digitized data acquisition, cloud computing, the IoT, and machine learning. In this study, we conducted a bibliometric analysis of publications on healthcare applications of AI indexed in Scopus from 1991 to 2022, exploring document trends, top sources, influential countries, dynamic keywords, and emerging research topics. A corpus of 2,335 articles from 8,536 authors was analysed, and the biblioshiny tool was used for data visualization to produce distance- and graph-based maps. The top 20 journals were extracted to capture recent trends in AI-based healthcare applications. Moreover, using the popular Latent Dirichlet Allocation (LDA) technique, the study presents a unique set of topics and terms that correlate with certain areas of AI. The results reveal shifting trends in AI and its applications in healthcare: certain areas of machine learning and deep learning are gaining momentum while others are diminishing. The United States emerges as a dominant force in AI healthcare research, with over 5,000 citations, and keyword analysis reveals a shift from fuzzy logic to deep learning, with deep learning research surging to 616 publications in 2021. While limitations exist, such as the need for broader databases like the Web of Science, this study underscores AI's evolving role in healthcare and demonstrates its potential to revolutionize patient care and healthcare operations, laying the foundation for future innovations.
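
    To illustrate the topic-modelling step, the sketch below runs Latent Dirichlet Allocation over a toy list of abstracts with scikit-learn; the corpus, topic count, and vectorizer settings are placeholders, not the study's actual setup.

        # A minimal LDA sketch on a toy corpus; the real study analysed 2,335
        # Scopus abstracts, and the settings here are illustrative only.
        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.feature_extraction.text import CountVectorizer

        docs = [
            "deep learning model for medical image diagnosis",
            "machine learning prediction of patient readmission risk",
            "fuzzy logic system for clinical decision support",
            "neural network diagnosis of disease from imaging data",
        ]

        # LDA consumes bag-of-words counts rather than raw text.
        vec = CountVectorizer(stop_words="english")
        X = vec.fit_transform(docs)

        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

        # Print the top terms per topic: the "set of topics and terms".
        terms = vec.get_feature_names_out()
        for i, weights in enumerate(lda.components_):
            top = [terms[j] for j in weights.argsort()[-4:][::-1]]
            print(f"topic {i}: {', '.join(top)}")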